Abstract

The visual perception of form information is considered to be based on the functioning
of simple and complex neurons in the primate striate cortex. However, a review of
the physiological data on these brain cells shows that the data cannot be harmonized
with either the perceptual spatial frequency performance of primates or the performance
which is necessary for form perception in humans. This discrepancy, together with recent
interest in cortical-like and perceptual-like processing in image coding and machine
vision, prompted a series of image processing experiments intended to provide some
definition of the selection of image operators. The experiments were aimed at determining
operators which could be used to detect edges in a computational manner consistent
with the visual perception of structure in images. Fundamental issues were the selection
of size and circular versus oriented operators (or some combination). In a
previous study, circular difference-of-Gaussian (DOG) operators, with peak spatial
frequency responses at about 11 and 33 cyc/deg, were found to capture the primary
structural information in images. Here larger scale circular DOG operators were
explored; they led to severe loss of image structure and introduced spatial dislocations
in structure which are not consistent with visual perception. Orientation sensitive
operators (akin to one class of simple cortical neurons) introduced ambiguities of edge
extent regardless of the scale of the operator. For machine vision schemes which are
functionally similar to natural vision form perception, two circularly symmetric very high
spatial frequency channels appear to be necessary and sufficient for a wide range of
natural images. Such a machine vision scheme is most similar to the physiological
performance of the primate lateral geniculate nucleus rather than the striate cortex.


Introduction 

Vision is central to most human activity and the visual perception of form is the most 
central of visual skills. There is an interest in the development of a machine vision 
system which is functionally similar to natural vision. This system would have to 
include image acquisition, extraction of form information, "visual learning and mem- 
ory" and recognition and interpretation. This paper is concerned only with the first two 
elements in this system and the use of natural vision concepts to influence the design
of this subsystem.

The perception of form is widely considered to be based in the functioning of the 
simple and complex neurons of the striate cortex. For the design of a machine vision 
system the spatial responses of these cells are of considerable interest, both for 
image coding and information extraction. Here a review of physiological data on cortical
neurons and perceptual data provides a starting point for examining fundamental
issues of the type(s) and size(s) of edge detection operators that are necessary and
sufficient for extracting form information from arbitrary images. Equivalently for linear
systems this amounts to a definition of the spatial frequency channels necessary for
edge detection performance that is consistent with the visual perception of primary
structure in images. This can be thought of as a question of what spatial frequency 
channels carry the most important form information. Here the selection of edge detec- 
tion operators is limited to those occurring in early primate spatial vision and specific 
configurations which produce zero crossings and, more specifically, only one zero 
crossing per edge event. Within these limitations, a variety of edge detection 
operators are applied to determine consistency with visual perception. 


Physiological and Perceptual Spatial Frequency
Performance of Foveal Primate Vision

A review of primate physiological and perceptual data was conducted in order to 
provide insight into natural vision mechanisms of form perception. A significant dis- 
crepancy emerged when the collective spatial frequency response of foveal striate 
cortical neurons (Ref. 1) is compared to the perceptual spatial frequency response 
(Ref. 2) of the same primate (Fig. 1). The perceptual curve is also representative of the 
human response to luminance sine wave gratings (Ref. 1). With respect to form per- 
ception, this discrepancy is particularly disturbing because of the rather coarse spatial 
resolution of the bulk of the cortical units. The degree of the discrepancy at high spa- 
tial frequencies was masked by the original data being plotted on log-log scales (a 
convention in natural vision science) which heavily compress the high spatial fre- 
quency portion of the scale. Further cortical data was assembled (Fig. 2 and Ref. 3-7) 
in hopes of finding higher spatial frequency units. This did not prove to be the case. In 
terms of spatial resolution and coverage of the retinal image, high spatial frequency 
units should not only be present but also occur in populations that should increase as 
the square of spatial frequency. This persistent discrepancy led to a retreat backward 
in the visual pathway to examine the physiology of the lateral geniculate nucleus 
(Fig. 3 and Ref. 8). Here the high spatial frequency units are indeed present. This 
encourages the belief that any omission of high acuity channels in the striate cortex 
data is not due to experimental measurement limitations. However, there is still a 
paucity of numbers of units peaking at a spatial frequency near the limit set by the 
retinal mosaic of photoreceptors (~ 33 cyc/deg). Subsequently, I will illustrate that this
channel at 33 cyc/deg does seem to be necessary for human form perception. For 
now it is sufficient to state that the thinness or complete absence of high spatial 
frequency units raises serious questions about how to use physiology as a source for 
defining machine vision methods aimed at computational form perception. The edge 
detection operations necessary for this are now explored via image processing 
experiments which will shift through low spatial frequencies to high and examine the 
issue of one circular operator per scale versus multiple orientation sensitive operators 
per scale. 
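
The population argument above can be made concrete with a short calculation. This is only a sketch, assuming that each unit covers a patch of the retinal image whose linear size is inversely proportional to its peak spatial frequency, so that the number of units needed to tile a fixed area grows as the square of frequency. The function name `relative_population` is illustrative and does not come from the cited physiological studies.

```python
def relative_population(freq_cyc_per_deg: float, ref_freq_cyc_per_deg: float) -> float:
    """Number of units needed to tile a fixed retinal area at one peak
    frequency, relative to the number needed at a reference frequency.

    Assumption: patch width scales as 1/frequency, so the count of patches
    covering a two-dimensional area scales as frequency squared.
    """
    return (freq_cyc_per_deg / ref_freq_cyc_per_deg) ** 2


# Compare the three peak frequencies discussed in the text, using the
# 3 cyc/deg cortical operating range as the reference.
for freq in (3.0, 11.0, 33.0):
    print(f"{freq:5.1f} cyc/deg -> {relative_population(freq, 3.0):7.1f} x reference population")
```

Under this assumption, units peaking near 33 cyc/deg would be expected to outnumber those near 3 cyc/deg by roughly two orders of magnitude, which is why their scarcity in the cortical data stands out.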


Basic Spatial Responses of Primate Foveal Vision
for Use in Zero Crossing Edge Detection

Since neither the spatial layout of primate receptive fields nor other fundamental 
aspects of edge detection in primates are known, the liberty must be taken of "playing"
with what is known in image processing experiments. The intent here is not to perform
an exact simulation of natural vision but rather to gain insight into computational
methods which can produce a result consistent with visual perception. In no way can this
definition of methods be considered to be a scientific model for natural vision pro- 
cesses because primate vision could use an entirely different means of achieving the 
same end. We can only attempt to start out in some reasonable physiological 
manner and end up with some reasonable perceptual result. 

The perception of visual structure depends predominantly on accurate locational
determination of edge boundaries. Locational accuracy is paramount because such high acuity
tasks as reading require locational accuracies approximately equivalent to the original
image sampling grid of the photoreceptor mosaic (≈ 0.015 degree sample cells). A
direct and computationally concise method of detecting edge locations is a zero
crossing determination (Ref. 9). Not all of the spatial responses found in early primate
vision are useful for this. Retinal receptive fields do not eliminate the zero spatial
frequency response and therefore do not produce exact zero crossing locations that are
independent of the actual image intensity values. Complex cortical neuron responses
are apparently highly multilobed and introduce a correspondence problem, i.e.,
multiple zero crossings for one edge event. At the other extreme, one class of simple
cortical neurons, one ridge and one valley, produces no zero crossing at all. This
process of elimination narrows the types of spatial responses to the two shown (Fig. 4).
Details of the edge detection methods are provided in Refs. 10 and 11.
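
As an illustration of zero crossing edge detection with a circular operator, the following minimal sketch builds a zero-sum DOG kernel, applies it to an image, and marks sign changes between neighboring elements. It is not the method of Refs. 10 and 11. The surround-to-center width ratio of 1.6, the edge-replicating border handling, and the `min_slope` contrast threshold are all assumptions made for the example.

```python
import numpy as np


def dog_kernel(sigma_center: float, ratio: float = 1.6) -> np.ndarray:
    """Circular difference-of-Gaussians kernel with zero response at zero
    spatial frequency. The ratio 1.6 is an assumed, commonly used value."""
    sigma_surround = ratio * sigma_center
    radius = int(np.ceil(4 * sigma_surround))
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    r2 = x ** 2 + y ** 2
    center = np.exp(-r2 / (2 * sigma_center ** 2))
    surround = np.exp(-r2 / (2 * sigma_surround ** 2))
    kernel = center / center.sum() - surround / surround.sum()
    return kernel - kernel.mean()  # remove any residual DC term


def convolve_same(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Direct 2-D filtering with edge replication. The kernel is symmetric,
    so correlation and convolution give the same result here."""
    r = kernel.shape[0] // 2
    padded = np.pad(image, r, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def zero_crossings(response: np.ndarray, min_slope: float = 0.0) -> np.ndarray:
    """Mark elements where the response changes sign toward the right or
    lower neighbor and the jump across the crossing exceeds min_slope."""
    edges = np.zeros(response.shape, dtype=bool)
    a, b = response[:, :-1], response[:, 1:]
    edges[:, :-1] |= (a * b < 0) & (np.abs(a - b) > min_slope)
    a, b = response[:-1, :], response[1:, :]
    edges[:-1, :] |= (a * b < 0) & (np.abs(a - b) > min_slope)
    return edges


# A vertical luminance step between columns 15 and 16.
step = np.zeros((32, 32))
step[:, 16:] = 1.0
response = convolve_same(step, dog_kernel(sigma_center=1.0))
edges = zero_crossings(response, min_slope=1e-3)
print(np.argwhere(edges[0]).ravel())
```

Because the kernel sums to zero, uniform regions give no response and the crossing location does not depend on the absolute intensity values, only on where the step lies. In this example each row is marked at a single element adjacent to the step, i.e., one zero crossing per edge event.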

For visual comparisons throughout this paper, image size on the printed page is 
selected to make the size of each image element equal to the visual acuity limit 
(≈ 0.015 degrees) for a comfortable reading distance of about 20 inches. This in effect
spatially calibrates visual perception to computational processes. Contrast rendition of 
these images as published is not particularly accurate; however, most of the percep- 
tual content of the images is retained and most of the defects in edge detection are 
sufficiently glaring that highly accurate contrast rendition is unnecessary. 
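
The calibration above amounts to simple geometry, sketched below. The 0.015 degree element and the 20 inch viewing distance are taken from the text. The variable names are illustrative, and the equivalent frequency assumes that one cycle spans two image elements.

```python
import math

ELEMENT_DEG = 0.015          # visual angle of one image element (from the text)
VIEWING_DISTANCE_IN = 20.0   # comfortable reading distance in inches (from the text)

# Physical size on the page of one image element at the viewing distance.
element_size_in = VIEWING_DISTANCE_IN * math.tan(math.radians(ELEMENT_DEG))

# Printing density required for one image element per acuity-limit cell.
elements_per_inch = 1.0 / element_size_in

# Highest spatial frequency representable when one cycle spans two elements.
max_freq_cyc_per_deg = 1.0 / (2.0 * ELEMENT_DEG)

print(f"element size:      {element_size_in:.5f} in")
print(f"elements per inch: {elements_per_inch:.0f}")
print(f"highest frequency: {max_freq_cyc_per_deg:.1f} cyc/deg")
```

This gives an element of a little over five thousandths of an inch, or roughly 190 elements per inch, and an upper frequency of about 33 cyc/deg, matching the highest channel considered in the experiments.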


Edge Detection at 3 Cycles/Degree

The spatial frequency range centered on 3 cycles/degree is the collective operating 
range of the bulk of measured cortical neurons considered to be responsible for form 
perception and is therefore of initial interest in comparing edge detection experiments 
with visual perception. 


A. Circular DOG Operator 

With the calibration of computation to perception of one image element corresponding
to 0.015 degrees (≈ 33 cycles/deg), a DOG operator at 3 cycles per degree
has a center diameter of 11 image elements. The visual effect of this amount of blur is
illustrated by convolving an original image with a Gaussian blur function whose circular
full width at half of maximum value is 11 image elements (Fig. 5). Examples of
edge representations for the 3 cycle/degree DOG operator (Fig. 6) consistently show
little or no form information left after this amount of blurring. Cases where an impression
of some form information is given suffer from serious spatial dislocation from true
edge locations in original image space. These results do not convey any convincing
feeling that edge detection at 3 cyc/deg is the basis for form perception.
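
The scale of the blur can be worked out numerically. The sketch below assumes that center diameter scales inversely with peak spatial frequency, so that 33 cyc/deg corresponds to one image element and 3 cyc/deg to 11. It then converts the full width at half maximum (FWHM) into a Gaussian standard deviation using the standard relation FWHM = 2 sqrt(2 ln 2) sigma. The function name and the kernel truncation at three standard deviations are choices made for the example.

```python
import numpy as np


def gaussian_kernel_from_fwhm(fwhm_elements: float):
    """Return a normalized circular Gaussian blur kernel and its sigma,
    given the full width at half maximum in image elements."""
    sigma = fwhm_elements / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    radius = int(np.ceil(3 * sigma))  # assumed truncation radius
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    kernel = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum(), sigma


# One element at 33 cyc/deg scales to 33 / 3 = 11 elements at 3 cyc/deg.
center_diameter = 33.0 / 3.0
kernel, sigma = gaussian_kernel_from_fwhm(center_diameter)

print(f"center diameter: {center_diameter:.0f} image elements")
print(f"sigma:           {sigma:.2f} image elements")
print(f"kernel size:     {kernel.shape[0]} x {kernel.shape[1]}")
```

A blur of this width averages over an area comparable to whole characters in printed text at the calibrated viewing distance, which is consistent with the loss of form information reported for Fig. 6.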


B. Oriented Operators at 3 Cycles/Degree

Before shifting from 3 cyc/deg to higher spatial frequency operators, we can exam- 
ine operators with an orientation axis to see if this very cortical character for an opera- 
tor provides some better representation than the circular case. One vertical and one 
horizontal ridge-with-valleys operator are used. The merged results (Fig. 7) illustrate a 
problem which may be fundamental to oriented operators, i.e., an ambiguity in edge 
extent and location for edges with geometrical shape at the scale of the operator. This 
problem appears to arise from the locational uncertainty introduced along the orienta- 
tion axis simply by making the operator have an oriented response. This ambiguity 
will reappear at higher spatial frequencies when oriented operators are used. In any 
event, there seems to be no pathway to the capture of form information using either 
type of operator at 3 cycles/degree. 


Edge Detection - Circular Versus Oriented Operators at
33 Cycles/Degree in a High Acuity Visual Task

The reading of printed text is a high acuity visual task since the narrowest linewidths
of lettering for most print occupy a visual angle of ≈ 0.015 degrees. Edge detection of
printed text is then a reasonable test for high spatial resolution comparison with visual
perception. The well defined, finely detailed character of letters readily exposes flaws
in edge representations. For this experiment the operators used are shown in Table 1.
The operator for the circular DOG has been shown to produce the equivalent to a con- 
volution of the original scene intensity distribution with a circular DOG by making use 
of the Gaussian blur of the image forming optics (Ref. 10). The oriented operator is 
intended to be a crude approximation to the smallest possible ridge with two valleys. 
The actual continuous character of the operator in the scene domain has not been 
investigated. A vertical and horizontal operator are again used for initial visual 
assessment. Various contrast thresholds for edge detection are explored (Fig. 8). No 
particular threshold can be found which eliminates the "cross stitch" artifacts while 
retaining all edges whose orientation matches that of the operator. Perhaps a fuller
set of orientations would improve performance, but fundamentally the oriented operators
confound edge extent with edge contrast in a manner which is not clearly resolvable.
This difficulty is not encountered with the circular DOG, where locational uncertainties
are equal and can be minimized in all directions at once by choice of diameter. An
accurate edge representation of printed text (Fig. 9) is obtainable with the circular DOG
over a wide range of edge detection contrast thresholds, while the best result with the
oriented operator does not approach the accuracy of the visual perception of printed text.
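
The confounding of edge extent with edge contrast can be illustrated with a deliberately simplified, hypothetical operator. It is not the operator of Table 1. The sketch assumes a vertical ridge-with-valleys operator whose cross-axis profile is (-1, 2, -1) and which averages over `operator_length` elements along its axis. It is applied down the center of a vertical bar one element wide, so the valleys fall on background and only the ridge contributes.

```python
import numpy as np


def response_along_axis(bar_length: int, contrast: float, operator_length: int = 5):
    """Response of the hypothetical oriented operator sampled along the
    center column of a one-element-wide vertical bar on a zero background."""
    column = np.zeros(bar_length + 2 * operator_length)
    column[operator_length:operator_length + bar_length] = contrast
    box = np.ones(operator_length) / operator_length  # averaging along the axis
    # Ridge weight is 2; both valleys see only background (zero).
    return 2.0 * np.convolve(column, box, mode="same")


def detected_extent(bar_length: int, contrast: float, threshold: float) -> int:
    """Number of elements along the axis whose response exceeds the threshold."""
    return int(np.count_nonzero(response_along_axis(bar_length, contrast) > threshold))


# The same 10-element bar appears to have different lengths depending on
# its contrast and on the detection threshold.
for contrast, threshold in [(1.0, 0.3), (0.5, 0.3), (1.0, 1.5)]:
    extent = detected_extent(bar_length=10, contrast=contrast, threshold=threshold)
    print(f"contrast {contrast:.1f}, threshold {threshold:.1f}: detected extent {extent}")
```

Because the response tapers gradually past the ends of the bar, the point where it drops below threshold moves with both contrast and threshold. In this toy case a 10-element bar is reported as longer or shorter than its true length, and no single threshold recovers the true extent for all contrasts.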


Recapitulation of Previous Results - Edge Detection and Contour Information
Extraction at 11 and 33 Cycles/Degree Using Circular DOG Operation

Experimental results of edge detection and contour information extraction (Ref. 11)
are revisited to provide a more comprehensive perspective on the visual information
content of different spatial frequency channels within the overall human-primate
spatial frequency response. In the previous work primary contour bearing information
channels (Fig. 10) were found to be the highest spatial frequency channels, 11 and 33
cycles/deg. Further, the 11 cyc/deg channel is a higher contrast subset of all image
phenomena visible at 11 cyc/deg. Contour extraction methods were based on the
combined visual significance of relatively high contrast, minimum degree of connectivity
and limitations on maximum spatial density of edge events. The two channels
are merged by giving the 33 cyc/deg channel priority in any local rivalry. The resulting
contour images (Fig. 11) do differ in some particulars from those of Ref. 11. This
reflects some changes in contour extraction methods in the interim.
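
One possible reading of the merging rule is sketched below. The text does not specify how "local rivalry" is defined, so the neighborhood size `rivalry_radius` and the function name `merge_channels` are hypothetical. The sketch keeps every 33 cyc/deg edge and keeps an 11 cyc/deg edge only where no 33 cyc/deg edge lies within the neighborhood.

```python
import numpy as np


def merge_channels(edges_33, edges_11, rivalry_radius: int = 1) -> np.ndarray:
    """Merge two binary edge maps, giving the 33 cyc/deg map priority
    wherever both channels report an edge in the same neighborhood."""
    edges_33 = np.asarray(edges_33, dtype=bool)
    edges_11 = np.asarray(edges_11, dtype=bool)
    h, w = edges_33.shape
    padded = np.pad(edges_33, rivalry_radius, mode="constant", constant_values=False)

    # Mark every element that has a 33 cyc/deg edge within the neighborhood.
    near_33 = np.zeros((h, w), dtype=bool)
    for dy in range(2 * rivalry_radius + 1):
        for dx in range(2 * rivalry_radius + 1):
            near_33 |= padded[dy:dy + h, dx:dx + w]

    return edges_33 | (edges_11 & ~near_33)


# Small example: one 33 cyc/deg edge, one rival 11 cyc/deg edge next to it,
# and one isolated 11 cyc/deg edge that survives the merge.
e33 = np.zeros((5, 5), dtype=bool)
e11 = np.zeros((5, 5), dtype=bool)
e33[2, 2] = True
e11[2, 3] = True   # adjacent to the 33 cyc/deg edge, so suppressed
e11[0, 0] = True   # no nearby 33 cyc/deg edge, so kept
merged = merge_channels(e33, e11)
print(merged.astype(int))
```

The connectivity and spatial density criteria mentioned above are not modeled here; they would act on each channel before or after this merge.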


Perceptual Evidence for a 33 Cycle/Degree Channel

The idea that two high spatial frequency channels with complementary spatial resolution
and contrast sensitivity are a necessary and sufficient basis for human-like form
perception in some machine implementation is considered further. Clearly, the higher
contrast sensitivity, coarser spatial resolution channel is essential to capture lower
contrast phenomena that make up much of perceived structure in images. But the
need for the 33 cyc/deg channel is not so obvious. The geometrical fidelity of detected
edges at 11 cyc/deg and 33 cyc/deg for an image of printed text is now considered
(Fig. 12-14) in relation to visually perceived outlines of letters. The original image is
shown side by side with the original image blurred by an amount approximately equal
to the 11 cyc/deg channel. Below the images are the detected edge representations
for the 33 and 11 cyc/deg channels respectively. The 11 cyc/deg representation is rich
in geometrical defects not found in the visual perception of the original image. These
defects are especially obvious in the details (Fig. 13-14) of the edge representations.
Therefore, it is necessary to require a 33 cyc/deg channel in order to duplicate the
geometrical fidelity present in the visual perception of high contrast finely detailed
image phenomena.


Discussion 

Taken as a whole, the results herein prompt speculation about the role of oriented
response simple neurons in primate vision. Considering both the coarse spatial resolution
and the extent ambiguities encountered in edge detection experiments, these
neurons seem more amenable to encoding shading information or perhaps connecting
shading information to contour information derived from other classes of neurons
(perhaps the parvocellular subsystem of the LGN). The apparent absence of high
visual acuity responses in the striate cortex is acutely disturbing since units of this type
should be present in very large numbers.

For machine vision, these results argue strongly against the use of oriented 
responses especially when coupled with the unavoidable computational complexity of 
such a family of operators. Further, the circular operator seems capable of producing 
all information that oriented responses are capable of as well as accurate locational 
definition of corners, tight loops, and other high acuity image phenomena which are 
not captured easily or at all by oriented responses. Naturally the orientation of
extended straight or moderately curved edges can be computed if needed from the 
detected edge loci of circular DOG operators. 


Table 1. Operators for Edge Detection Experiments
at 33 Cycles/Degree

Fig. 1 - Comparison of Collective Physiological with Perceptual
Spatial Frequency Responses for Foveal Primate Vision

[Plot axes: contrast sensitivity versus spatial frequency, cycles/deg]

Fig. 2 - Further Physiological Data on Foveal Cortical Neurons -
Range of Peak Spatial Frequency Responses.

Fig. 3 - Physiological Data on
Geniculate Nucleus - I

Fig. 4 - Spatial Responses in Primate Vision Used for
Zero-Crossing Edge Detection Experiments

Figure 5. Visual Effect of Blurring That is Approximately
Equivalent to a 3 Cyc/Deg Operator

Figure 6. Results - Edge Detection at 3 Cyc/Deg Using
Circular DOG Operator

[Panel text: Two different contrast thresholds for edge detection.]

Figure 7. Results - Edge Detection at 3 Cyc/Deg Using
Vertical and Horizontal Operators

Figure 8. Results - Edge Detection at 33 Cyc/Deg in a
High Acuity Task Using Vertical and Horizontal
Operators for a Range of Contrast Thresholds

[Panel text: Circular DOG]

Figure 9. Performance Comparison of Circular and Oriented
Operators at 33 Cyc/Deg

[Axis label: Contrast sensitivity]

Fig. 10 - Contour Information Channels

Figure 11. Recapitulation of Merged Results of Edge Detection
at 11 Cyc/Deg and 33 Cyc/Deg with Additional
Processing for Contour Extraction

[Panel text: b) Edge representation - 11 cyc/deg DOG]

Figure 12. The Perception of Printed Text as Evidence
of the Need for a 33 Cyc/Deg Channel

Figure 13. Details of Printed Text Edge Detection Results

Figure 14. Additional Details of Figure 12 a) & b)